[Anthropic] Support system role messages inside messages array#44283
Conversation
Co-Authored-By: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-Authored-By: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
This looks better, I'm closing my PR as it's not needed anymore |
|
/cc @DarkLight1337 @sfeng33 PTAL. |
…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
| system_parts.append(block.text) | ||
|
|
||
| # System messages embedded inside the messages array | ||
| for msg in anthropic_request.messages: |
There was a problem hiding this comment.
@chaunceyjiang @aleksandaryanakiev @sfeng33 @andrew @potatosalad I'm a bit concerned about the system role fix. It seems like merging a mid-conversation system:role message into a single system message could cause issues with KV-cache hits. In multi-turn conversations, this would likely change the prefix, potentially hurting cache reuse.
There was a problem hiding this comment.
Yes, I have also observed this issue. The fix here is not correct. I am trying a new solution.
There was a problem hiding this comment.
@chaunceyjiang
OK, I also have an idea here. Later, I will prepare a Merge Request for you. You can check if it meets your requirements.
There was a problem hiding this comment.
@chaunceyjiang @aleksandaryanakiev @sfeng33 @andrew
please check this merge request: https://github.com/vllm-project/vllm/pull/44602/changes
…ching PR vllm-project#44283 merged all inline system:role messages into a single leading system message, which changes the conversation prefix and breaks KV-cache hits in multi-turn dialogues. This fix keeps inline system messages at their original position: - Remove inline system extraction from _convert_system_message (only top-level system is handled there) - In _convert_messages, handle system messages with a dedicated _extract_system_text helper that strips billing headers and only emits the message if real content exists — avoiding the _convert_block / _convert_message_content path which does not strip billing headers and may omit the "content" key - Add tests for billing header stripping on inline system messages Unlike vllm-project#44048 which moves the same merge logic to the protocol layer, this approach fundamentally avoids the prefix-breaking merge entirely. Co-authored-by: Hermes Agent
|
I noticed the prefix caching concern discussed here. I opened #44602 with an alternative approach that preserves inline |
…ching PR vllm-project#44283 merged all inline system:role messages into a single leading system message, which changes the conversation prefix and breaks KV-cache hits in multi-turn dialogues. This fix keeps inline system messages at their original position: - Remove inline system extraction from _convert_system_message (only top-level system is handled there) - In _convert_messages, handle system messages with a dedicated _extract_system_text helper that strips billing headers and only emits the message if real content exists — avoiding the _convert_block / _convert_message_content path which does not strip billing headers and may omit the "content" key - Add tests for billing header stripping on inline system messages Unlike vllm-project#44048 which moves the same merge logic to the protocol layer, this approach fundamentally avoids the prefix-breaking merge entirely. Co-authored-by: Hermes Agent Signed-off-by: felix0080 <felix0080@users.noreply.github.com>
…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: JisoLya <523420504@qq.com>
…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
|
Hello! I found that the current Anthropic streaming response does not include the I suggest adding role="assistant"between the two lines below. I tested this change locally, and it fixed the issue without affecting the normal Claude Code experience: vllm/vllm/entrypoints/anthropic/serving.py Lines 657 to 658 in 469f3dc Since I'm not familiar with the vLLM contribution process, and the issue I found appears to be related to this existing PR, I hope someone from the community can help implement or review this simple fix. |
## What this PR does / why we need it? Backports vLLM PR #44283 via a vllm-ascend platform monkey patch for the pinned /vllm-workspace/vllm runtime. The patch accepts `role: system` entries in Anthropic Messages API `messages`, merges inline system content with the top-level `system` prompt, strips Claude Code billing headers in both places, and skips inline system entries when converting the remaining chat history. Fixes vllm-project/vllm#44000 Backports vllm-project/vllm#44283 ## Does this PR introduce _any_ user-facing change? Yes. Anthropic-compatible `/v1/messages` requests from newer Claude Code clients can include `role: system` messages inside the `messages` array without failing validation. ## How was this patch tested? - `pytest -q tests/ut/patch/platform/test_patch_anthropic_system_message.py` - `ruff check vllm_ascend/patch/platform/patch_anthropic_system_message.py tests/ut/patch/platform/test_patch_anthropic_system_message.py vllm_ascend/patch/platform/__init__.py vllm_ascend/patch/__init__.py` - `ruff format --check vllm_ascend/patch/platform/patch_anthropic_system_message.py tests/ut/patch/platform/test_patch_anthropic_system_message.py vllm_ascend/patch/platform/__init__.py vllm_ascend/patch/__init__.py` - vLLM version: v0.20.2 - vLLM main: vllm-project/vllm@9090368 Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com> Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
…project#44283) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com> Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
Purpose
[Anthropic] Support system role messages inside messages array
FIX #44000
Test Result
before

after

Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)